[HIPDNN] Tolerance lookup followup#8752
Draft
bghimireamd wants to merge 19 commits into
Draft
Conversation
…oading Bundle discovery scans integration_test_bundles/ for .meta.json files, derives GTest names from folder structure, and eagerly loads graph + tensor data. IntegrationTestBundle models bundles as facets with optional golden outputs. BundleRegistration wires discovered bundles into GTest. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…model Verification modes (auto|golden|gpu|cpu) control what engine output is compared against. The harness extracts golden outputs before execution, runs references via ReferenceCapabilityError-aware adapters, and reports unverifiable bundles. Includes CLI flags, env fallbacks, and unit tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Graph-only bundles without input data get their inputs synthesized per-op via free functions dispatched by node attribute type. RoleAccounting enforces that every owned leaf input is accounted for (FREE/STRUCTURED/DERIVED) — unaccounted inputs refuse the bundle. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rename FillOutcome → SynthesisResult and RoleAccounting → SynthesisTracker for clarity: SynthesisTracker tracks per-node input role declarations, SynthesisResult reports whether synthesis succeeded. - Rename RoleAccounting.hpp → SynthesisTracker.hpp to match the class name. - Expand input synthesis from 3 ops to all 19 supported attribute types (conv fwd/bwd/wrw, batchnorm inference/variance/training/backward, matmul, pointwise, reduction, layernorm fwd/bwd, rmsnorm fwd/bwd, resample, block-scale dequantize/quantize, sdpa fwd/bwd). - Remove single-node graph restriction — harness now loops over all nodes in a graph, supporting fused graphs (e.g. conv+bias+relu). - Improve comments throughout for reviewer clarity: explain what "owns" means, document the 4-step fill pattern, add per-function comments explaining why specific inputs are STRUCTURED or DERIVED. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move all method implementations out of the 831-line header-only harness into a .cpp file (~700 lines), leaving the header as declarations only (~210 lines). The harness is an internal test executable with virtual methods — no reason for it to be header-only. Add 9 unit tests for SynthesisTracker covering: all-FREE success, undeclared inputs, STRUCTURED/DERIVED refusals, uid-0 skip, non-owned uid skip, empty owned set, mixed failures, and factory methods. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Make the harness's mode dispatch testable by reading the verification mode through a protected virtual rather than directly from the TestConfig singleton. Add 10 unit tests covering every branch: auto mode with and without golden data, GPU→CPU fallback chain, golden/gpu/cpu explicit modes, and capability-miss skip paths. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…clang-tidy - Track gpuRefErrored in runAutoMode() so the skip message distinguishes GPU-ref-errored from GPU-ref-capability-miss when CPU ref also misses. - Extract resolveVerificationMode() and resolveGoldenDataDir() as free functions (CLI > env > nullopt precedence) and add 6 unit tests. - Fix readability-implicit-bool-conversion in SynthesizeInputs.hpp: change all if(!ptr) to if(ptr == nullptr). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously each fill function created its own SynthesisTracker with the whole-graph leaf input UIDs, so finish() rejected any UID not declared by that single node. For multi-node/fused graphs (e.g. conv+relu), the second node's finish() saw the first node's UIDs as "no role declared" and returned unsupported → SKIP. Fix: create one SynthesisTracker in synthesizeInputs(), pass it through all fill functions, call finish() once after all nodes have declared their UIDs. Each node now only accounts for its own inputs, and the final finish() verifies that the union covers all leaf inputs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…-optional, runEngineOrSkip - Fill output tensors with NaN sentinel instead of zero so allClose's NaN guard catches unwritten elements (previously masked as computed zero) - Mark FP8 descale/scale factors as STRUCTURED (refuse to fabricate random values that cause identical saturation → vacuous pass), matching BlockScaleDequantize precedent - Make metadata mandatory only when golden outputs are present; graph-only bundles default-construct empty metadata instead of failing with MISSING_METADATA - Extract runEngineOrSkip() helper to deduplicate engine-run preamble across all three verification modes - Thread engine error message into skip reason (was silently discarded) - Make applyMetadataGuards() virtual so unit tests can override the hardware-guard path without initializing TestConfig singleton - Fix test name lint failures (suite/case naming convention) - Add SynthesisTracker precondition documentation - Add synthesizeInputs() phase documentation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ref path - Remove uid!=0 guard from SynthesisTracker::isOwned() — uid=0 is a valid tensor uid when the graph declares it as a real leaf input. The "absent optional" convention (uid 0 = not connected) is the fill function's responsibility, not the tracker's. - Guard ref-path device allocation on _requiresDevice so CI runners without a GPU don't trigger std::bad_alloc from rawDeviceData(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ghimire/ALMIOPEN-1970/golden_reference_fallback
deriveTestName() threw when a graph .json sat at the data root with no
folder above it. That blocked the natural "--gd .../graph_only_bundle"
invocation, where the user points the data root straight at a single
bundle's folder. Use the root folder's own name as the suite name in
that case instead of throwing, so the bundle is discovered as
{folder}.{stem}.
Update the unit test (JsonAtRootThrows -> JsonAtRootUsesFolderNameAsSuite)
to assert the new behavior.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Extract shared TomlGuards.hpp and wire all three harnesses through it. Fix toleranceForNodeAttributes missing SdpaAttributes/SdpaBackwardAttributes. Add SynthesisTracker::fillComputed + tensorAt for follow-up init unification. Rename appendTensorDiff/appendFpDiff to writeTensorDiffReport/writeFpDiffReport. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…-init unification branch) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ards calls Both harnesses now use the same shared free functions (skipIfTomlMatched, applyTomlToleranceOverride) from TomlGuards.hpp. The TOML lookup logic is already unit-tested in TestTestSettings.cpp. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Harness unification (ALMIOPEN-1969 follow-up): - Share input synthesis between both harnesses: move SynthesisTracker and SynthesizeInputs to harness/input_init/ (namespace hipdnn_integration_tests). The non-golden harness now drives its initializeBundle() through the same synthesis switch, with a random [-1,1] fallback. Remove 4 of 7 non-golden initializeBundle overrides whose ranges/seeds now match the shared fill functions; 3 remain (fused-graph range conflicts and a large-values stress test). - Rename harness/golden/ -> harness/bundle/ and namespace hipdnn_integration_tests::golden -> ::bundle. "golden" was inaccurate: the harness verifies bundles via golden data OR a GPU/CPU reference, and most bundles ship no golden output. Rename the class IntegrationGraphGoldenReferenceVerificationHarness -> IntegrationBundleVerificationHarness (parallel to its sibling IntegrationGraphVerificationHarness). Fix stale "golden" log/report strings that mislabeled all bundles or shared comparison paths. The golden verification *mode* and golden output *data* vocabulary, and the --golden-data-dir CLI flag, are retained (accurate / external). - Drop redundant hipInit/hipGetDevice/_deviceId from the graph harness: HIP initializes lazily and getSharedHandle()->hipdnnCreate() already does it before any graph runs; _deviceId was write-only. - Split bundle metadata guards: the VRAM check applies to every bundle the engine runs, but the GPU-arch lock only matters when comparing against golden output values (arch-specific). Gate the arch check on hasGoldenOutputs so an inputs-only bundle verified against a local reference is not wrongly skipped on a different arch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Both harnesses resolved comparison tolerance with duplicated, divergent
code. Replace both with one shared resolver
(harness/tolerance/ToleranceResolver.hpp) where the aggregation policy is a
plain function (GraphWrapper, dtype) -> float, selectable per caller.
What changed:
- New resolveTolerance(wrapper, dtype, testName, atol, rtol, policy=max):
derives a default via the chosen aggregation policy, then applies the TOML
per-test override (highest priority) in ONE place. Single tolerance entry
point for both harnesses; the override is no longer applied separately in
either, so the layering order lives here alone.
- Two policies, each a free function (no enum/switch; the policy IS the
function, C++17 function pointer):
* maxAcrossNodes — loosest per-node tolerance; conservative envelope,
never tighter than any node so it cannot cause a false failure.
* outputOpTolerance — tolerance of the last non-Pointwise op (the
output-producing op); reproduces the graph harness's historical
getTolerance() behavior.
- Wiring preserves existing behavior on both sides:
* Graph harness getTolerance() -> outputOpTolerance (its prior policy),
now expressed through the shared resolver instead of a private
dynamic_cast switch. It also serializes via to_binary() and reads the
output dtype from the flatbuffer, so it shares the resolver keyed on the
same representation the bundle harness uses.
* Bundle harness -> maxAcrossNodes (its prior policy), unchanged.
The two policies agree for the common one-real-op + activation case
(activation is Pointwise -> skipped; the single real op is both loosest and
last), so they differ only on multi-real-op fusions, which carry explicit
overrides today.
- Deletes the duplicate per-op tolerance switch that existed in each harness
(one on dynamic_cast of live nodes, one on the flatbuffer NodeAttributes
enum); both now share one flatbuffer-keyed lookup.
- warnIfMultipleOutputs() guards the single-output assumption: every current
policy reduces over the whole graph, not the per-output subgraph, so
multi-output graphs are flagged loudly (deferred fix, ALMIOPEN-2216).
Why output-op for the graph harness (not max): a mechanical migration of the
C++ graph tests should not change their tolerances. outputOpTolerance keeps the
exact prior numbers. Note this is a heuristic, not a principled tight bound
(it attributes the whole output tolerance to one op); it is kept for migration
parity. The principled tighten path is the future DynamicTolerances upgrade
(see TODO in ToleranceResolver.hpp and ALMIOPEN-2216); FP4 also needs sub-bf16
entries in the dtype switch (currently falls through to 1e-3).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
67c0a2b to
e8ff12f
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
Technical Details
Test Plan
Test Result
Submission Checklist